From Norms to Neuropsychopathology: Exploring Neuroanatomical & Neuropsychiatric Variation in Alzheimer’s Disease Through Normative Modeling
This study extends the analyses of Verdi et al. (2023), who used normative modeling to delineate neuroanatomical heterogeneity in Alzheimer’s disease. We employ a similar methodological framework, supplementing those insights by integrating structural MRI data with neuropsychiatric symptom profiles. This integration lets us probe the neuroanatomical and neuropsychiatric underpinnings of Alzheimer’s disease across different phenotypes within our ADNI subset. This study was conducted by the Applied Neuroimaging Statistics Research Laboratory at McLean Hospital & Harvard Medical School.
Author: Adrián Medina
Date: October 4, 2024
Project Overview
Analytic Background: Our study data were a subset of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) ADNI3 wave, chosen because scanner sequence protocols were harmonized across data collection sites beginning with this wave. We included only subjects with both structural MRI and amyloid PET data, followed by QA of those data. This analysis leveraged a normative model developed by the Predictive Clinical Neuroscience Group at the Donders Institute and Radboud UMC, which aims to predict and stratify brain disorders on the basis of neuroimaging data. Specifically, we used ‘HBR_lifespan_36K_79sites’, a Hierarchical Bayesian Regression model trained on 37,128 subjects from 79 different collection sites, across the human lifespan.
Considerations:
Ideally, both adaptation and testing sets would be balanced by age, sex, and site (covariates) following something like a 60/40 or 70/30 split of healthy controls. However, given our limited sample size, we decided to keep all of our healthy control and patient data isolated.
In our analysis: The adaptation set is used to calibrate for site (only healthy controls), while the testing set is used exclusively for patient-phenotyped data. Healthy control phenotypes include ‘A-C-’ (amyloid negative, cognitive impairment negative) & ‘A+C-’ (amyloid positive, cognitive impairment negative).
Patient phenotypes include ‘A+C+’ (amyloid positive, cognitive impairment positive) & ‘A-C+’ (amyloid negative, cognitive impairment positive). Given our smaller sample size and the large number of sites (59), our group elected to use the MRI manufacturer (3 total) of each subject’s imaging data as a pseudo-site variable, giving more power to viably calibrate for potential “site” influences.
- 1 = GE
- 2 = Philips
- 3 = Siemens
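The manufacturer-to-pseudo-site recoding above can be sketched as follows (a hypothetical Python helper; the label strings are illustrative, not the literal ADNI `MRI_MANU` values):

```python
# Hypothetical mapping mirroring the 1/2/3 pseudo-site coding described above.
MANU_TO_PSEUDOSITE = {"GE": 1, "Philips": 2, "Siemens": 3}

def to_pseudosite(manufacturer: str) -> int:
    """Collapse an MRI manufacturer label into a pseudo-site code (1=GE, 2=Philips, 3=Siemens)."""
    if manufacturer not in MANU_TO_PSEUDOSITE:
        raise ValueError(f"Unknown manufacturer: {manufacturer!r}")
    return MANU_TO_PSEUDOSITE[manufacturer]
```

Unknown manufacturer strings raise an error rather than silently creating a fourth pseudo-site.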
Some manual edits may be needed to match your imaging variables to the column headers listed in the normative model CSV template (listed in the Normative Modeling GUI). Specifically, in the Compute here! menu bar: under Data type, select ThickAvg; under Normative Model, select HBR_lifespan_36K_79sites; after a moment, the site will populate the link “Download template csv file here.”
Upon completion of this step, you are then able to run your adaptation and testing data sets through the normative model deviation scores computation via either the Normative Modeling GUI or Braincharts GUI if you want to recreate the Python code yourself.
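Conceptually, the deviation scores returned by either GUI are standardized residuals against the normative prediction. A minimal sketch of that idea (hypothetical arrays; this is not the PCN implementation itself):

```python
import numpy as np

def deviation_scores(y, y_hat, sigma):
    """Z-style deviation: observed value minus the normative prediction,
    scaled by the model's predictive standard deviation."""
    y, y_hat, sigma = (np.asarray(a, dtype=float) for a in (y, y_hat, sigma))
    return (y - y_hat) / sigma

# A subject 0.10 mm below a predicted thickness of 2.50, with sigma = 0.05,
# sits two predictive standard deviations below the norm.
z = deviation_scores([2.40], [2.50], [0.05])
```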
Set up the R environment for neuroimaging data analysis by specifying CRAN and ggseg repositories and managing package installations with pacman. Load essential libraries and set a working directory to access neuroimaging and demographic datasets, which are then read and prepared for analysis, focusing on group cortical thickness comparisons and deviation score analyses. Rename the ‘BioMarkers’ variable to ‘phenotype’ across datasets for consistent terminology.
Tip
Front-loading the reading-in of all datasets being used in comprehensive analyses at the beginning of your script ensures that the data environment is consistently prepared, which simplifies the replication and validation process for other users.
```r
# Set CRAN repository for consistent package installation, ggseg for neuroimaging visualizations
options(repos = c(ggseg = 'https://ggseg.r-universe.dev', CRAN = 'https://cloud.r-project.org'))

# Install and load the pacman package for efficient package management
if (!require(pacman)) install.packages("pacman")
library(pacman)

# Use p_load function to install (if necessary) and load packages
p_load(dplyr, knitr, kableExtra, psych, tidyr, readr, stringr, matrixStats, ggseg, ggseg3d,
       ggsegExtra, ggsegDesterieux, cowplot, data.table, e1071, ggplot2, plot.matrix, proxy,
       RPMG, broom, gridExtra, patchwork, caret, tidyverse, fastDummies, sjPlot, ggbeeswarm,
       lavaan, caTools, bitops, effects, ggeffects, reshape2, ggpubr)

# Specify the 'base_path' where you can find your data files, ASSUMING they're in the same directory, & set it as WD
base_path <- "~/GitHub/ADNI_NormativeModeling/data_files"
setwd(base_path)

# Read in foundational data set
ADNI_274_Merged <- read_csv("ADNI_274_Merged.csv")
### FreeSurfer variables were edited manually to match PCN template
### Covariates: 'age', 'sex', 'site'

# Read in finalized file, to be used ONLY for group cortical thickness comparisons (not deviation score analyses)
df_merged <- read_csv("ADNI_274_Merged_Final.csv")

# Read in computed deviation scores (the file you get back after uploading adaptation and testing sets)
df <- read_csv("ADNI_deviation_scores_BLR.csv")

# Read in PDC deviation data sets
PDC_HBR <- read_csv("PDC_HBR.csv")
PDC_BLR <- read_csv("PDC_BLR.csv")

# Rename biomarker variable to phenotype
ADNI_274_Merged <- dplyr::rename(ADNI_274_Merged, phenotype = "BioMarkers")
df_merged <- dplyr::rename(df_merged, phenotype = "BioMarkers")
df <- dplyr::rename(df, phenotype = "BioMarkers")
```
1. For reference, this merged data file is composed of two data sets: NIDP_DX_Dem_NP_MRI.csv contains clinical/non-imaging variables and MRI manufacturer information, & ADNI_lh_rh_thickness_subcort_volumes.csv contains the FreeSurfer brain thickness measures.
2. See details under the important callout in the Code Workflow section.
3. This should be the spreadsheet that will be split into adaptation and testing sets for normative modeling.
4. This dataset contains subject FreeSurfer variable deviation scores for your TESTING set used in the Normative Modeling GUI/code.
Load and prepare Destrieux atlas data for visualization and analysis. The atlas data from the ggseg package (which spells the name ‘desterieux’) are assigned to variables for easy access and plotted to verify the structure and detail of the atlas features and labeled regions.
Caution
The ggseg package adds an extra ‘e’ in the spelling of the name Destrieux, so it’s NOT a typo.
```r
setwd(base_path)

# Assign atlas data to data frames in local environment
desterieux_dims <- desterieux$data
desterieux <- desterieux

# Plot simple atlas features to test data frame with dimension data
plot(desterieux_dims) +
  theme(legend.position = "bottom",
        legend.text = element_text(size = 7)) +
  guides(fill = guide_legend(ncol = 3))
```
NULL
```r
# Plot atlas ROIs with labels to test data frame with complete vectors
ggplot() +
  geom_brain(atlas = desterieux)
```
Covariate Distributions, Data Splitting, & Balance Verification
Visualize and analyze the distribution of MRI manufacturers and models within the dataset after renaming the biomarker variable to phenotype. Bar plots display counts of MRI manufacturers and models grouped by manufacturer, with additional cross-tabulation bar plots showing the number of subjects by phenotype across different sites.
```r
# Create a bar plot for MRI manufacturers with counts displayed
ggplot(ADNI_274_Merged, aes(x = MRI_MANU)) +
  geom_bar(stat = "count", fill = "skyblue", color = "black") +
  geom_text(stat = "count", aes(label = ..count..), vjust = -0.5) +
  labs(title = "Count of MRI Manufacturers", x = "MRI Manufacturer", y = "Count") +
  theme_minimal()
```
```r
# Create separate bar plots for each manufacturer
ggplot(ADNI_274_Merged, aes(x = MRI_MAKE, fill = MRI_MANU)) +
  geom_bar(position = "dodge") +
  geom_text(aes(label = ..count..), stat = "count",
            position = position_dodge(width = 0.9), vjust = -0.5) +
  facet_wrap(~MRI_MANU, scales = "free_x") +
  labs(title = "Counts of Each MRI Make Grouped by Manufacturer", x = "MRI Make", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better visibility
```
```r
# Counting subjects by phenotype within each site
phenotype_site_counts <- df_merged %>%
  group_by(site, phenotype) %>%
  dplyr::summarise(Count = n(), .groups = 'drop')

# Plot with Site on x-axis
plot_site_x <- ggplot(phenotype_site_counts, aes(x = site, y = Count, fill = phenotype)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_text(aes(label = Count), vjust = -0.5, position = position_dodge(width = 0.9)) +
  labs(title = "Subject Counts by Phenotype Across Sites",
       x = "Site",
       y = "Number of Subjects",
       fill = "Phenotype") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0, hjust = 1))

# Plot with Phenotype on x-axis
plot_phenotype_x <- ggplot(phenotype_site_counts, aes(x = phenotype, y = Count, fill = factor(site))) +
  geom_bar(stat = "identity", position = position_dodge()) +  # Use position_dodge() for proper grouping
  geom_text(aes(label = Count), vjust = -0.5, position = position_dodge(width = 0.9)) +  # Adjust text positioning
  labs(title = "Subject Counts Across Phenotype by Site",
       x = "Phenotype",
       y = "Number of Subjects",
       fill = "Site") +
  scale_fill_brewer(palette = "Set1") +  # Using a qualitative palette for distinct sites
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0, hjust = 1))  # Improve readability of x-axis labels

# Optionally, print both plots to view them in the R environment
print(plot_site_x)
```
```r
print(plot_phenotype_x)
```
Create adaptation and testing datasets from the merged data based on specific biomarker criteria, setting aside the groups for separate analyses. The datasets are saved as CSV files, which will be used for further normative model computations after ensuring that the data are balanced in terms of covariates and grouping variables.
```r
# Separate the data based on 'phenotype'
adaptation_data <- df_merged %>% filter(phenotype %in% c("A-C-", "A+C-"))
testing_data <- df_merged %>% filter(phenotype %in% c("A-C+", "A+C+"))

# Save the adaptation and testing datasets as CSV files, un-comment when verified
# write_csv(adaptation_data, "ADNI_175_Adaptation.csv")
# write_csv(testing_data, "ADNI_99_Testing.csv")
```
Note
After verifying that the grouping variable and covariates are balanced as desired (see the next chunk), you can run the two sets through the normative model computation via the GUI, or recreate the code yourself!
Generate visualizations to assess the balance of biomarkers, site, and sex distributions in adaptation and testing datasets, and calculate the average age for each dataset. These plots and summaries ensure the datasets are appropriately balanced before further normative model computations.
```r
# Plotting the distribution of 'BioMarkers' in both datasets
biomarker_distribution <- bind_rows(
  mutate(adaptation_data, dataset = "Adaptation"),
  mutate(testing_data, dataset = "Testing")
) %>%
  group_by(dataset, phenotype) %>%
  dplyr::summarise(Count = n(), .groups = 'drop') %>%
  group_by(dataset) %>%
  mutate(Percentage = (Count / sum(Count)) * 100)

ggplot(biomarker_distribution, aes(x = phenotype, y = Percentage, fill = dataset)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(title = "Distribution of Biomarker", x = "Biomarker", y = "Percentage (%)") +
  theme_minimal()
```
```r
# Plotting the distribution of 'site' in both datasets
site_distribution <- bind_rows(
  mutate(adaptation_data, dataset = "Adaptation"),
  mutate(testing_data, dataset = "Testing")
) %>%
  group_by(dataset, site) %>%
  dplyr::summarise(Count = n(), .groups = 'drop') %>%
  group_by(dataset) %>%
  mutate(Percentage = (Count / sum(Count)) * 100)

ggplot(site_distribution, aes(x = site, y = Percentage, fill = dataset)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(title = "Distribution of Site", x = "Site", y = "Percentage (%)") +
  theme_minimal()
```
```r
# Plotting the distribution of 'sex' in both datasets
sex_distribution <- bind_rows(
  mutate(adaptation_data, dataset = "Adaptation"),
  mutate(testing_data, dataset = "Testing")
) %>%
  group_by(dataset, sex) %>%
  dplyr::summarise(Count = n(), .groups = 'drop') %>%
  group_by(dataset) %>%
  mutate(Percentage = (Count / sum(Count)) * 100)

ggplot(sex_distribution, aes(x = sex, y = Percentage, fill = dataset)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(title = "Distribution of Sex", x = "Sex", y = "Percentage (%)") +
  theme_minimal()
```
```r
# Calculating and comparing the average age
age_summary <- bind_rows(
  mutate(adaptation_data, dataset = "Adaptation"),
  mutate(testing_data, dataset = "Testing")
) %>%
  group_by(dataset) %>%
  dplyr::summarise(Average_Age = mean(age, na.rm = TRUE), .groups = 'drop')
print(age_summary)
```
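Beyond eyeballing the bar plots, a formal balance check can be run on the cross-tabulated counts; a Python sketch using a chi-square test of independence (the counts below are hypothetical adaptation/testing splits across the three pseudo-sites, not the study's actual tables):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Toy counts: rows = datasets, columns = pseudo-sites (GE, Philips, Siemens)
counts = np.array([
    [30, 31, 114],   # adaptation set (hypothetical)
    [19, 17,  63],   # testing set (hypothetical)
])

# A large p-value means site composition does not differ detectably between sets
chi2, p, dof, expected = chi2_contingency(counts)
```

The same test applies to the sex and phenotype cross-tabulations; for age, a two-sample t-test on the group means is the analogous check.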
Reorganize and recode key variables in the dataset for group cortical thickness comparisons: set the column order for clarity, recode the ‘phenotype’ variable to categorical levels representing health conditions, and standardize categorical labels for ‘site’ and ‘sex’ to enhance readability and analysis consistency.
```r
# Specifying the order of the first few columns for easier viewing
desired_order <- c("Subject_ID", "age", "sex", "site", "phenotype")

# Append the remaining column names that are not specified in desired_order
remaining_cols <- setdiff(names(df_merged), desired_order)

# Combine the specified order with the remaining columns
new_order <- c(desired_order, remaining_cols)

# Reorder the columns in df according to new_order
df_merged <- df_merged[, new_order]

# Recode phenotype variable
df_merged$phenotype <- recode(df_merged$phenotype,
                              "A-C-" = "HC",
                              "A+C-" = "HC",
                              "A-C+" = "MCI",
                              "A+C+" = "AD")
df_merged$phenotype <- as.factor(df_merged$phenotype)
df_merged$phenotype <- factor(df_merged$phenotype, levels = c("HC", "MCI", "AD"))  # explicitly set the reference level

# Recode site variable
df_merged$site <- gsub("1", "GE", df_merged$site)
df_merged$site <- gsub("2", "Philips", df_merged$site)
df_merged$site <- gsub("3", "Siemens", df_merged$site)
df_merged$site <- as.factor(df_merged$site)

# Recode sex variable, 0 = females 1 = males
df_merged$sex <- gsub("0", "Female", df_merged$sex)
df_merged$sex <- gsub("1", "Male", df_merged$sex)
df_merged$sex <- factor(df_merged$sex)
```
Demographic Descriptives and Statistical Analysis of Cortical Thickness
Generate descriptive statistics and visualize the distribution of key variables such as phenotype, age, and sex within the dataset. Employ cross-tabulation to analyze mean cortical thickness and age by phenotype group, and present these distributions using violin and bar plots. Additionally, compute summary statistics for mean cortical thickness and visually represent them using rain cloud plots to highlight differences across phenotype groups.
```r
# Count of each variable of interest
table(df_merged$phenotype)
```
HC MCI AD
175 40 59
```r
table(df_merged$sex)
```
Female Male
153 121
```r
table(df_merged$site)
```
GE Philips Siemens
49 48 177
```r
# Cross-Tabulation of average thickness and phenotype group
describeBy(df_merged$Mean_Thickness, df_merged$phenotype)
```
Descriptive statistics by group
group: HC
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 175 2.43 0.08 2.43 2.43 0.07 2.17 2.65 0.48 -0.15 0.12 0.01
------------------------------------------------------------
group: MCI
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 40 2.4 0.08 2.41 2.4 0.07 2.24 2.59 0.35 -0.1 -0.28 0.01
------------------------------------------------------------
group: AD
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 59 2.35 0.09 2.34 2.35 0.1 2.16 2.55 0.39 -0.08 -0.83 0.01
```r
d <- describeBy(df_merged$Mean_Thickness, df_merged$phenotype)

# Cross-Tabulation of age and phenotype group
describeBy(df_merged$age, df_merged$phenotype)
```
Descriptive statistics by group
group: HC
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 175 70.1 5.94 69 69.81 5.93 55 90 35 0.53 0.66 0.45
------------------------------------------------------------
group: MCI
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 40 71.97 8.81 72 72.16 8.9 56 88 32 -0.22 -0.87 1.39
------------------------------------------------------------
group: AD
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 59 72.25 7.73 73 72.43 7.41 55 89 34 -0.19 -0.33 1.01
```r
mean(df_merged$age)
```
[1] 70.83942
```r
sd(df_merged$age)
```
[1] 6.871648
```r
# Generate the violin plot of age stratified by phenotype categories
ggplot(df_merged, aes(x = phenotype, y = age, fill = phenotype)) +
  geom_violin() +
  labs(title = "Distribution of Age Stratified by Phenotype Groups",
       x = "Phenotype Group", y = "Age") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0, hjust = 1))  # Improve readability of x-axis labels
```
```r
# Cross-Tabulation of sex & phenotype, visualization
ggplot(df_merged, aes(x = phenotype, fill = sex)) +
  geom_bar(position = "dodge") +
  labs(title = "Distribution of Sex Across Phenotype Groups",
       x = "Phenotype Group",
       y = "Count",
       fill = "Sex") +
  theme_minimal()
```
```r
# Create the summary Mean_Thickness data frame
filtered_MT_summary <- df_merged %>%
  group_by(phenotype) %>%
  dplyr::summarise(
    score_mean = mean(Mean_Thickness, na.rm = TRUE),
    n = n(),  # Sample size for each group
    sem = sd(Mean_Thickness, na.rm = TRUE) / sqrt(n()),
    .groups = 'drop'
  )

# Create the rain cloud plot with separate components
ggplot(df_merged, aes(x = phenotype, y = Mean_Thickness, fill = phenotype, color = phenotype)) +
  PupillometryR::geom_flat_violin(adjust = 1.5, trim = FALSE, alpha = 0.5,
                                  position = position_nudge(x = 0.2, y = 0), colour = NA) +
  geom_point(aes(x = as.numeric(phenotype) - 0.25, y = Mean_Thickness, colour = phenotype),
             position = position_jitter(width = .05), size = .5, shape = 20) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, width = 0.25,
               position = position_dodge(width = 0.3), colour = "black") +
  geom_point(data = filtered_MT_summary, aes(x = factor(phenotype), y = score_mean),
             shape = 18, position = position_nudge(x = 0.2)) +
  geom_errorbar(data = filtered_MT_summary,
                aes(x = factor(phenotype), y = score_mean,
                    ymin = score_mean - sem, ymax = score_mean + sem),
                width = 0.05, position = position_nudge(x = 0.2)) +
  labs(title = "Distribution of Mean Cortical Thickness Stratified by Phenotype Groups",
       y = "Score", x = "Phenotype",
       fill = "Phenotype Group",   # Legend title for fill
       color = "Phenotype Group"   # Legend title for color
  ) +
  theme_minimal() +
  theme(legend.position = "right",
        legend.title = element_text(face = "bold"))  # Make legend title bold
```
Model the impact of phenotype on mean cortical thickness using linear regression, adjusting for covariates such as age and sex. Visualize these relationships and conduct Tukey’s HSD tests to identify significant differences between phenotype groups, presenting results in both tabular form and through a significance plot of the Tukey HSD test outcomes.
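Under the hood, the covariate-adjusted model is ordinary least squares on a design matrix containing an intercept, group dummies, and the continuous covariates; a toy NumPy sketch on simulated data (the coefficients, sample size, and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
age = rng.uniform(55, 90, n)
is_ad = rng.integers(0, 2, n)  # hypothetical AD-vs-HC dummy

# Simulated thickness: true AD effect -0.08 mm, true age slope -0.003 mm/yr
thickness = 2.6 - 0.003 * age - 0.08 * is_ad + rng.normal(0, 0.05, n)

# Design matrix mirrors Mean_Thickness ~ phenotype + age (sex omitted for brevity)
X = np.column_stack([np.ones(n), is_ad, age])
beta, *_ = np.linalg.lstsq(X, thickness, rcond=None)
# beta[1] recovers the group effect, beta[2] the age slope
```

With enough data, the fitted coefficients recover the simulated group effect despite the age confound, which is exactly what adjusting for covariates buys you.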
```r
# Fit a linear model of Mean_Thickness as a function of phenotype and store it in s_2
s_2 <- lm(Mean_Thickness ~ phenotype, data = df_merged)

# Create a table for the linear model with confidence intervals
tab_model(s_2, title = "Linear Model: Average Cortical Thickness as a Function of Phenotype")
```
Linear Model: Average Cortical Thickness as a Function of Phenotype (dependent variable: Mean Thickness)

| Predictors      | Estimates | CI            | p      |
|-----------------|-----------|---------------|--------|
| (Intercept)     | 2.43      | 2.42 – 2.44   | <0.001 |
| phenotype [MCI] | -0.03     | -0.06 – 0.00  | 0.079  |
| phenotype [AD]  | -0.08     | -0.10 – -0.05 | <0.001 |

Observations: 274. R² / R² adjusted: 0.126 / 0.119
```r
# Perform Tukey's Honest Significant Difference test and store results
tukey_results_s2 <- TukeyHSD(aov(Mean_Thickness ~ phenotype, data = df_merged))
print(tukey_results_s2)
```
Tukey multiple comparisons of means 95% family-wise confidence level
Fit: aov(formula = Mean_Thickness ~ phenotype, data = df_merged)
```r
tukey_data_s2 <- as.data.frame(tukey_results_s2$phenotype)
tukey_data_s2$Comparison <- rownames(tukey_data_s2)
tukey_data_s2$significant <- ifelse(tukey_data_s2$`p adj` < 0.05, "Significant", "Not Significant")

# Creating the plot of the Tukey HSD test with significance indication
ggplot(tukey_data_s2, aes(y = Comparison, xmin = lwr, xmax = upr, x = diff)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
  geom_errorbarh(aes(height = 0.2, color = significant)) +
  geom_point(aes(x = diff, color = significant), size = 2) +
  scale_color_manual(values = c("Significant" = "red", "Not Significant" = "blue")) +
  labs(title = "Tukey HSD Test Results for Mean Cortical Thickness by Phenotype",
       x = "Differences in Mean Levels of Phenotype",
       y = "Comparison",
       color = "P-value Significance") +
  theme_minimal()
```
```r
# Fit a linear model of Mean_Thickness as a function of phenotype with added covariates
s_1 <- lm(Mean_Thickness ~ phenotype + age + sex, data = df_merged)

# Create a table for the linear model including covariates with confidence intervals
tab_model(s_1, title = "Enhanced Linear Model: Average Cortical Thickness as a Function of Phenotype, Adjusted for Age and Sex")
```
Enhanced Linear Model: Average Cortical Thickness as a Function of Phenotype, Adjusted for Age and Sex (dependent variable: Mean Thickness)

| Predictors      | Estimates | CI            | p      |
|-----------------|-----------|---------------|--------|
| (Intercept)     | 2.58      | 2.48 – 2.68   | <0.001 |
| phenotype [MCI] | -0.02     | -0.05 – 0.01  | 0.221  |
| phenotype [AD]  | -0.07     | -0.10 – -0.05 | <0.001 |
| age             | -0.00     | -0.00 – -0.00 | 0.007  |
| sex [Male]      | -0.02     | -0.04 – 0.00  | 0.119  |

Observations: 274. R² / R² adjusted: 0.159 / 0.147
```r
# Plot multiple regression model (note: the predictors belong inside the terms vector)
ggpredict(s_1, terms = c("age", "phenotype", "sex")) |>
  plot() +
  labs(title = "Average Cortical Thickness as a Function of Phenotype, Adjusted for Age and Sex",
       x = "Age",
       y = "Mean Cortical Thickness (mm)",
       caption = "Note: 'phenotype' factors, 'HC' = Healthy Control; 'MCI' = Mild Cognitive Impairment; & 'AD' = Alzheimer's Disease.")
```
Conduct ANOVA to assess differences across phenotype groups in brain regions’ FreeSurfer volume measures, then apply False Discovery Rate (FDR) corrections to p-values and calculate F-statistics. Finally, visualize significant regions using a heatmap, categorizing results by significance levels, and incorporating annotations for clearer demarcation of significance groups.
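The FDR step described here is the Benjamini–Hochberg procedure, which is what R's `p.adjust(..., method = "fdr")` computes. A minimal NumPy sketch of the adjustment, useful for sanity-checking the R output:

```python
import numpy as np

def fdr_bh(pvals):
    """Benjamini-Hochberg FDR adjustment (equivalent to R's p.adjust(method='fdr')).

    Each sorted p-value is scaled by n/rank, then monotonicity is enforced
    from the largest p-value downward so adjusted values never decrease.
    """
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out
```

For example, `fdr_bh([0.01, 0.04, 0.03, 0.005])` matches `p.adjust(c(0.01, 0.04, 0.03, 0.005), "fdr")` element for element.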
```r
# Apply suffix to all ROI FreeSurfer measures
df.rois <- df_merged %>% rename_at(vars((6:153)), ~ paste0(., '_rois'))

# Run ANOVA analyses for each ROI across phenotype group and extract just the p-values
df.stats <- as.data.frame(sapply(X = df.rois[, grep("_rois", names(df.rois), value = T)],
                                 FUN = function(x) summary(aov(x ~ df.rois$phenotype))[[1]][["Pr(>F)"]][1]))

# Rename columns in ANOVA output data frame
names(df.stats) <- "p_value"
setDT(df.stats, keep.rownames = "ROI")

# Verify that changes were made via the first 20 ROIs
head(df.stats, 20)
```
```r
### Repeat steps to now obtain the F-stat values
# Run ANOVA analyses for each ROI across phenotype group and extract just the F-stat values
df.f_stats <- as.data.frame(sapply(X = df.rois[, grep("_rois", names(df.rois), value = T)],
                                   FUN = function(x) summary(aov(x ~ df.rois$phenotype))[[1]][["F value"]][1]))

# Rename columns in ANOVA output data frame
names(df.f_stats) <- "f_stat"
setDT(df.f_stats, keep.rownames = "ROI")

# Verify that changes were made via the first 20 ROIs
head(df.f_stats, 20)
```
```r
# FDR-correct the ANOVA p-values before assigning significance
# (this step was implied by the text; method matches the t-test chunks below)
df.stats$FDR.pvalue <- p.adjust(df.stats$p_value, method = "fdr")

# Assign significance levels
df.stats$Significance <- ifelse(df.stats$FDR.pvalue < 0.001, '***',
                         ifelse(df.stats$FDR.pvalue < 0.01, '**',
                         ifelse(df.stats$FDR.pvalue < 0.05, '*', '')))

# Merge data frames for visualization
df.stats_merged <- merge(df.stats, df.f_stats, by = "ROI")

# Melt the data for plotting
df.melted <- melt(df.stats_merged, id.vars = "ROI", variable.name = "Statistic", value.name = "Value")

# Ensure that 'Value' is numeric
df.melted$Value <- as.numeric(as.character(df.melted$Value))

# Ensure df.stats and df.melted are merged to filter and sort
df.melted <- df.melted %>%
  filter(Statistic == "f_stat") %>%
  left_join(df.stats %>% select(ROI, Significance, FDR.pvalue), by = "ROI") %>%
  filter(FDR.pvalue < 0.05) %>%
  arrange(match(Significance, c("", "*", "**", "***")))  # Order by significance levels

# Remove suffix from ROI list
df.melted$ROI <- gsub("_rois", "", df.melted$ROI)

# Define group positions and labels
group_positions <- data.frame(
  start = c(1, 15, 30),        # will need to adjust depending on your own results
  end = c(14, 29, 51),         # will need to adjust depending on your own results
  label = c("*", "**", "***")  # example labels for significance
)

# Create the heatmap
heatmap_plot <- ggplot(df.melted, aes(x = ROI, y = Statistic, fill = Value)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(title = "Heatmap of ANOVA Statistics Across ROIs", x = "Region of Interest", y = "F-Statistic") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Add custom brackets using annotations
for (i in 1:nrow(group_positions)) {
  heatmap_plot <- heatmap_plot +
    annotate("segment", x = group_positions$start[i], xend = group_positions$end[i],
             y = Inf, yend = Inf, yjust = 1.5, color = "black", size = 0.5) +
    annotate("text", x = (group_positions$start[i] + group_positions$end[i]) / 2,
             y = Inf, label = group_positions$label[i], vjust = 2, hjust = 0.5)
}

# Print the customized heatmap
print(heatmap_plot)
```
Adjust brain region names in the data to align with the Destrieux atlas nomenclature for both left and right hemispheres, performing extensive renaming to match the standard naming convention. This ensures the region names are compatible with established neuroimaging atlas formats for subsequent analyses.
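The renaming logic is a chain of string substitutions. A toy distillation in Python, covering only the first few rules with hypothetical input labels (the full R chain below handles many more special cases):

```python
import re

def to_desterieux_label(roi: str) -> str:
    """Illustrative subset of the renaming rules: strip the hemisphere prefix,
    then convert FreeSurfer-style separators toward ggseg 'desterieux' spacing."""
    s = re.sub(r"^[LR]_", "", roi)       # drop hemisphere prefix
    s = s.replace("G&S_", "G and S ")    # must run before the bare "G_"/"S_" rules
    s = s.replace("G_", "G ").replace("S_", "S ")
    s = s.replace("_", " ")
    return s
```

Order matters: the compound `G&S_` rule must fire before the single-letter rules, exactly as in the R `gsub` chain.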
```r
# Remove suffix from ROI list
df.stats$ROI <- as.data.frame(gsub("_rois", "", df.stats$ROI))

### Rename ROIs to assign nomenclature consistent with the 'desterieux' atlas package
# Left hemisphere ROIs
left.df.stats <- df.stats %>% filter(str_detect(ROI, "L_"))
x <- gsub("L_", "", left.df.stats$ROI)
y <- gsub("G&S_", "G and S ", x)
z <- gsub("G_", "G ", y)
z1 <- gsub("S_", "S ", z)
z2 <- gsub("_", " ", z1)
z3 <- gsub(" bin", "", z2)
z4 <- gsub("\\.", " ", z3)
z5 <- gsub("cingul ", "cingul-", z4)
z6 <- gsub("Mid ", "Mid-", z5)
z7 <- gsub("Post ", "Post-", z6)
z8 <- gsub("inf ", "inf-", z7)
z9 <- gsub("med ", "med-", z8)
z10 <- gsub("sup ", "sup-", z9)
z11 <- gsub("Fis ant ", "Fis-ant-", z10)
z12 <- gsub("precentral ", "precentral-", z11)
z13 <- gsub("Fis pos ", "Fis-pos-", z12)
z14 <- gsub("lg&S", "lg and S", z13)
z15 <- gsub("oc temp", "oc-temp", z14)
z16 <- gsub("sup and transversal", "sup-transversal", z15)
z17 <- gsub("orbital H Shaped", "orbital-H Shaped", z16)
z18 <- gsub("oc sup&transversal", "oc sup and transversal", z17)
z19 <- gsub("prim Jensen", "prim-Jensen", z18)
z20 <- gsub("S oc-temp med&Lingual", "S oc-temp med and Lingual", z19)
z21 <- gsub("lat fusifor", "lat-fusifor", z20)
z22 <- gsub("middle&Lunatus", "middle and Lunatus", z21)
z23 <- gsub("intrapariet&P trans", "intrapariet and P trans", z22)
renamed_ROIs <- gsub("Lat Fis post", "Lat Fis-post", z23)

# Compare the renamed ROIs against the atlas labels; an empty result means no mismatches
desterieux_ROIs <- as.data.frame(desterieux_dims %>% filter(hemi == "left"))$region
compare_lists <- cbind(sort(renamed_ROIs), sort(unique(desterieux_ROIs)))
list_matches <- compare_lists[, 1] %in% compare_lists[, 2]
compare_lists[!list_matches, ]
```
[,1] [,2]
```r
## if no mismatches, then add to data.frame as region
left.df.stats$region <- renamed_ROIs

# Right hemisphere ROIs
right.df.stats <- df.stats %>% filter(str_detect(ROI, "R_"))
x <- gsub("R_", "", right.df.stats$ROI)
y <- gsub("G&S_", "G and S ", x)
z <- gsub("G_", "G ", y)
z1 <- gsub("S_", "S ", z)
z2 <- gsub("_", " ", z1)
z3 <- gsub(" bin", "", z2)
z4 <- gsub("\\.", " ", z3)
z5 <- gsub("cingul ", "cingul-", z4)
z6 <- gsub("Mid ", "Mid-", z5)
z7 <- gsub("Post ", "Post-", z6)
z8 <- gsub("inf ", "inf-", z7)
z9 <- gsub("med ", "med-", z8)
z10 <- gsub("sup ", "sup-", z9)
z11 <- gsub("Fis ant ", "Fis-ant-", z10)
z12 <- gsub("precentral ", "precentral-", z11)
z13 <- gsub("Fis pos ", "Fis-pos-", z12)
z14 <- gsub("lg&S", "lg and S", z13)
z15 <- gsub("oc temp", "oc-temp", z14)
z16 <- gsub("sup and transversal", "sup-transversal", z15)
z17 <- gsub("orbital H Shaped", "orbital-H Shaped", z16)
z18 <- gsub("oc sup&transversal", "oc sup and transversal", z17)
z19 <- gsub("prim Jensen", "prim-Jensen", z18)
z20 <- gsub("S oc-temp med&Lingual", "S oc-temp med and Lingual", z19)
z21 <- gsub("lat fusifor", "lat-fusifor", z20)
z22 <- gsub("middle&Lunatus", "middle and Lunatus", z21)
z23 <- gsub("intrapariet&P trans", "intrapariet and P trans", z22)
renamed_ROIs <- gsub("Lat Fis post", "Lat Fis-post", z23)

desterieux_ROIs <- as.data.frame(desterieux_dims %>% filter(hemi == "right"))$region
compare_lists <- cbind(sort(renamed_ROIs), sort(unique(desterieux_ROIs)))
list_matches <- compare_lists[, 1] %in% compare_lists[, 2]
compare_lists[!list_matches, ]
```
[,1] [,2]
```r
## if no mismatches, then add to data.frame as region
right.df.stats$region <- renamed_ROIs
```
Visualize significant FDR-corrected p-values for brain regions across groups using gradient-filled maps for both left and right hemispheres, highlighted with different colors to represent statistical significance levels. These plots are organized in a grid layout to facilitate comparison between hemispheres.
Conduct pairwise statistical comparisons between controls and Alzheimer’s disease patients on cortical thickness measures. Perform t-tests on each brain region, followed by FDR corrections to adjust for multiple comparisons, organizing the results into a clean data frame with significance levels marked.
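R's `t.test()` defaults to Welch's unequal-variance test, which is what each per-region comparison computes. A Python sketch of one such ROI comparison on simulated data (group means, SDs, and sizes are loosely modeled on the descriptives above, but the samples are synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical cortical-thickness samples for a single ROI: HC vs AD
hc = rng.normal(2.43, 0.08, size=175)
ad = rng.normal(2.35, 0.09, size=59)

# Welch's t-test (equal_var=False), matching R's t.test() default behavior
t_stat, p_val = stats.ttest_ind(hc, ad, equal_var=False)
```

In the full analysis this test is repeated across all 148 regions and the resulting p-values are then FDR-corrected, since 148 uncorrected tests at alpha = 0.05 would be expected to yield several false positives by chance alone.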
```r
# Data frame preparation
df.CA <- df_merged[grep("HC|AD", df_merged$phenotype), ]  # Subset just Controls and Alzheimer's ("CA")

# Perform t-tests and extract the t-statistic, p-value, and degrees of freedom
df.CA_stats <- data.frame(
  statistic = sapply(df.CA[6:153], function(x) t.test(x ~ df.CA$phenotype)$statistic),
  p.value   = sapply(df.CA[6:153], function(x) t.test(x ~ df.CA$phenotype)$p.value),
  parameter = sapply(df.CA[6:153], function(x) t.test(x ~ df.CA$phenotype)$parameter)
)

# Perform FDR correction on the t-test p-values
# (note: the method argument must be INSIDE the p.adjust() call,
#  otherwise p.adjust silently falls back to its default, Holm)
df.CA_stats$FDR.pvalue <- p.adjust(df.CA_stats$p.value, method = "fdr")

# Clean up the FDR-corrected data frame
df.CA_stats <- tibble::rownames_to_column(df.CA_stats, "ROI")  # Create ROI column
df.CA_stats$ROI <- gsub("\\.t$", "", df.CA_stats$ROI)

# List the ROIs that are significant after FDR correction
df.CA_stats_sig <- df.CA_stats[which(df.CA_stats$FDR.pvalue < 0.05), ]
```
Conduct t-tests between control and MCI groups, extracting t-statistics and p-values for brain regions, followed by FDR correction to account for multiple comparisons, leading to a cleaned and organized presentation of significant regions.
```r
# Data frame preparation
df.CM <- df_merged[grep("HC|MCI", df_merged$phenotype), ]  # Subset just Controls and MCI ("CM")

# Perform t-tests and extract the t-statistic, p-value, and degrees of freedom
df.CM_stats <- data.frame(
  statistic = sapply(df.CM[6:153], function(x) t.test(x ~ df.CM$phenotype)$statistic),
  p.value   = sapply(df.CM[6:153], function(x) t.test(x ~ df.CM$phenotype)$p.value),
  parameter = sapply(df.CM[6:153], function(x) t.test(x ~ df.CM$phenotype)$parameter)
)

# Perform FDR correction on the t-test p-values (method must be inside p.adjust())
df.CM_stats$FDR.pvalue <- p.adjust(df.CM_stats$p.value, method = "fdr")

# Clean up the FDR-corrected data frame
df.CM_stats <- tibble::rownames_to_column(df.CM_stats, "ROI")  # Create ROI column
df.CM_stats$ROI <- gsub("\\.t$", "", df.CM_stats$ROI)

# List the ROIs that are significant after FDR correction
df.CM_stats_sig <- df.CM_stats[which(df.CM_stats$FDR.pvalue < 0.05), ]
```
Perform pairwise t-tests between MCI and Alzheimer’s groups, extracting critical statistics for comparative analysis, apply FDR corrections for multiple testing, and combine significant results to visualize in a lollipop plot, differentiating groups with colors and annotations for significance levels.
# Data frame preparation
df.MA <- df_merged[grep("MCI|AD", df_merged$phenotype), ] # Subset just MCI and Alzheimer's ("MA")

### MCI vs Alzheimer's t-test
# Perform t-tests and extract the t-statistic, p-value, and degrees of freedom
df.MA_stats  <- as.data.frame(sapply(df.MA[6:153], function(x) t.test(x ~ df.MA$phenotype)$statistic))
df.MA_stats1 <- as.data.frame(sapply(df.MA[6:153], function(x) t.test(x ~ df.MA$phenotype)$p.value))
df.MA_stats2 <- as.data.frame(sapply(df.MA[6:153], function(x) t.test(x ~ df.MA$phenotype)$parameter))

# Merge the data frames and remove the intermediate ones
df.MA_stats <- cbind(df.MA_stats, df.MA_stats1, df.MA_stats2)
rm(df.MA_stats1, df.MA_stats2)

# Clean up the t-test data frame
names(df.MA_stats) <- c('statistic', 'p.value', 'parameter')

# Perform FDR correction on the p-values
# (method must be passed inside p.adjust(); the default would otherwise be "holm")
df.MA_stats$FDR.pvalue <- p.adjust(df.MA_stats$p.value, method = "fdr")

# Clean up the FDR-corrected data frame
df.MA_stats <- tibble::rownames_to_column(df.MA_stats, "ROI") # Create ROI column
df.MA_stats$ROI <- gsub("\\.t$", "", df.MA_stats$ROI)

# List the ROIs that remain significant after FDR correction
df.MA_stats_sig <- df.MA_stats[which(df.MA_stats$FDR.pvalue < 0.05), ]

# Combine the significant results for easy plotting
combined_stats_sig <- bind_rows(
  df.CA_stats_sig %>% mutate(Comparison = "HC vs AD"),
  df.MA_stats_sig %>% mutate(Comparison = "MCI vs AD")
)

# Assign significance levels
combined_stats_sig$Significance <- ifelse(combined_stats_sig$FDR.pvalue < 0.001, '***',
                                   ifelse(combined_stats_sig$FDR.pvalue < 0.01, '**',
                                   ifelse(combined_stats_sig$FDR.pvalue < 0.05, '*', '')))

# Create the lollipop plot
ggplot(combined_stats_sig, aes(x = reorder(ROI, statistic), y = statistic, color = Comparison)) +
  geom_segment(aes(yend = 0, xend = ROI), size = 1, linetype = "dashed") +
  geom_point(size = 3) +
  labs(title = "Significant Differences in Brain Regions Across Comparisons",
       x = "Region of Interest",
       y = "T-Statistic Value",
       color = "Comparison",
       caption = "Note: 'HC' = Healthy Control; 'MCI' = Mild Cognitive Impairment; & 'AD' = Alzheimer's Disease.") +
  theme_minimal() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7.5)) # Rotate x labels for better visibility
Define a function to rename region of interest (ROI) labels for brain imaging data to align with the Destrieux atlas nomenclature. Adjust ROI names based on hemisphere, applying systematic replacements to match atlas standards, and then check for mismatches against the atlas database, allowing for application across multiple datasets and hemispheres.
# Rename ROIs to assign nomenclature consistent with the Destrieux atlas package
rename_rois <- function(df, hemi, desterieux_dims) {
  # Filter based on hemisphere
  hemi_prefix <- ifelse(hemi == "left", "L_", "R_")
  df_filtered <- df %>% filter(str_detect(ROI, hemi_prefix))

  # Perform renaming steps
  x <- gsub(hemi_prefix, "", df_filtered$ROI)
  patterns <- c("G&S_", "G_", "S_", "_", " bin", "\\.", "cingul ", "Mid ", "Post ", "inf ",
                "med ", "sup ", "Fis ant ", "precentral ", "Fis pos ", "lg&S", "oc temp",
                "sup and transversal", "orbital H Shaped", "oc sup&transversal", "prim Jensen",
                "S oc-temp med&Lingual", "lat fusifor", "middle&Lunatus", "intrapariet&P trans",
                "Lat Fis post")
  replacements <- c("G and S ", "G ", "S ", " ", "", " ", "cingul-", "Mid-", "Post-", "inf-",
                    "med-", "sup-", "Fis-ant-", "precentral-", "Fis-pos-", "lg and S", "oc-temp",
                    "sup-transversal", "orbital-H Shaped", "oc sup and transversal", "prim-Jensen",
                    "S oc-temp med and Lingual", "lat-fusifor", "middle and Lunatus",
                    "intrapariet and P trans", "Lat Fis-post")
  for (i in seq_along(patterns)) {
    x <- gsub(patterns[i], replacements[i], x)
  }

  # Match against the Destrieux ROI names; .env$hemi compares the column to the
  # function argument (hemi == hemi would compare the column to itself and keep every row)
  desterieux_ROIs <- as.data.frame(desterieux_dims %>% filter(hemi == .env$hemi))$region
  compare_lists <- cbind(sort(x), sort(unique(desterieux_ROIs)))
  list_matches <- compare_lists[, 1] %in% compare_lists[, 2]
  if (any(!list_matches)) {
    warning("There are mismatches in ROI names for the ", hemi, " hemisphere.")
  }

  df_filtered$region <- x
  return(df_filtered)
}

# Apply the function to each dataset and hemisphere
left.df.CA_stats  <- rename_rois(df.CA_stats, "left", desterieux_dims)
right.df.CA_stats <- rename_rois(df.CA_stats, "right", desterieux_dims)
left.df.CM_stats  <- rename_rois(df.CM_stats, "left", desterieux_dims)
right.df.CM_stats <- rename_rois(df.CM_stats, "right", desterieux_dims)
left.df.MA_stats  <- rename_rois(df.MA_stats, "left", desterieux_dims)
right.df.MA_stats <- rename_rois(df.MA_stats, "right", desterieux_dims)
Automated loop version
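The six rename_rois() calls above can be generated programmatically instead of written out by hand. A minimal sketch, assuming rename_rois() and the df.CA_stats, df.CM_stats, and df.MA_stats data frames from the previous chunk are already in the workspace:

```r
# Loop over each comparison and hemisphere, using get()/assign() so the
# resulting object names match the manual calls above (e.g. left.df.CA_stats)
for (comp in c("CA", "CM", "MA")) {
  stats_df <- get(paste0("df.", comp, "_stats"))
  for (hemi in c("left", "right")) {
    assign(paste0(hemi, ".df.", comp, "_stats"),
           rename_rois(stats_df, hemi, desterieux_dims))
  }
}
```

This keeps the two versions interchangeable: either chunk leaves the same six data frames in the environment.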
Visualize the significance of FDR-corrected p-values for pairwise comparisons (Control vs. Alzheimer’s, Control vs. MCI, and MCI vs. Alzheimer’s) across both hemispheres using the Destrieux atlas. Generate separate gradient-filled plots for each comparison and hemisphere, labeling each to reflect the specific groups compared. Display these plots using a grid layout to assess differences effectively.
### Control vs Alzheimer's
# Left hemisphere
left_CA_pvalues <- ggseg(.data = left.df.CA_stats, atlas = desterieux, mapping = aes(fill = FDR.pvalue),
                         hemisphere = "left", colour = "white", size = 0.2) +
  scale_fill_gradientn(limits = c(0, 0.05), colours = rainbow.colors(5))

# Right hemisphere
right_CA_pvalues <- ggseg(.data = right.df.CA_stats, atlas = desterieux, mapping = aes(fill = FDR.pvalue),
                          hemisphere = "right", colour = "white", size = 0.2) +
  scale_fill_gradientn(limits = c(0, 0.05), colours = rainbow.colors(5))

### Control vs MCI
# Left hemisphere
left_CM_pvalues <- ggseg(.data = left.df.CM_stats, atlas = desterieux, mapping = aes(fill = FDR.pvalue),
                         hemisphere = "left", colour = "white", size = 0.2) +
  scale_fill_gradientn(limits = c(0, 0.05), colours = rainbow.colors(5))

# Right hemisphere
right_CM_pvalues <- ggseg(.data = right.df.CM_stats, atlas = desterieux, mapping = aes(fill = FDR.pvalue),
                          hemisphere = "right", colour = "white", size = 0.2) +
  scale_fill_gradientn(limits = c(0, 0.05), colours = rainbow.colors(5))

### MCI vs Alzheimer's
# Left hemisphere
left_MA_pvalues <- ggseg(.data = left.df.MA_stats, atlas = desterieux, mapping = aes(fill = FDR.pvalue),
                         hemisphere = "left", colour = "white", size = 0.2) +
  scale_fill_gradientn(limits = c(0, 0.05), colours = rainbow.colors(5))

# Right hemisphere
right_MA_pvalues <- ggseg(.data = right.df.MA_stats, atlas = desterieux, mapping = aes(fill = FDR.pvalue),
                          hemisphere = "right", colour = "white", size = 0.2) +
  scale_fill_gradientn(limits = c(0, 0.05), colours = rainbow.colors(5))

# Add titles to the individual plots
left_CA_pvalues  <- left_CA_pvalues  + labs(title = "Healthy Controls vs Alzheimer's Group")
right_CA_pvalues <- right_CA_pvalues + labs(title = "Healthy Controls vs Alzheimer's Group")
left_CM_pvalues  <- left_CM_pvalues  + labs(title = "Healthy Controls vs MCI Group")
right_CM_pvalues <- right_CM_pvalues + labs(title = "Healthy Controls vs MCI Group")
left_MA_pvalues  <- left_MA_pvalues  + labs(title = "MCI Group vs Alzheimer's Group")
right_MA_pvalues <- right_MA_pvalues + labs(title = "MCI Group vs Alzheimer's Group")

# Print plots to verify that they were correctly constructed
cowplot::plot_grid(left_CA_pvalues, right_CA_pvalues, nrow = 2)
Transform and reorganize analytic variables for comprehensive analyses by converting the ‘phenotype’ column into dummy variables, removing superfluous columns, and systematically renaming and reordering columns for clarity. This process ensures data columns align more effectively with analytical needs, including adjusting categorical variables for models and ensuring ROI names are standardized without duplicates.
# Use dummy_cols() (fastDummies) to create dummy variables for the 'phenotype' column
df <- dummy_cols(df, select_columns = "phenotype", remove_first_dummy = TRUE, remove_selected_columns = TRUE)

# Rename the dummy variable column back to phenotype
df <- dplyr::rename(df, phenotype = "phenotype_A+C+")

# Omit the 'index' column from the data frame, an unnecessary variable
df <- df[, -which(names(df) == "index")]

# Specify the order of the first few columns for easier viewing
# (this is also the list of columns to exclude from the ROIs)
desired_order <- c("Subject_ID", "age", "sex", "site", "phenotype")

# Append the remaining column names that are not specified in desired_order
remaining_cols <- setdiff(names(df), desired_order)

# Combine the specified order with the remaining columns
new_order <- c(desired_order, remaining_cols)

# Reorder the columns in df according to new_order
df <- df[, new_order]

# Extract column names, remove those in desired_order, and remove z-score suffixes
ROI <- names(df)[!names(df) %in% desired_order & !grepl("_Z_predict", names(df))]

# Further clean the ROI names to ensure no duplicates from z-score names
ROI <- unique(gsub("_Z_predict", "", ROI))

# Recode the phenotype variable
df$phenotype <- gsub("0", "MCI", df$phenotype)
df$phenotype <- gsub("1", "AD", df$phenotype)
df$phenotype <- factor(df$phenotype, levels = c("MCI", "AD")) # explicitly set the reference level

# Recode the site variable
df$site <- gsub("1", "GE", df$site)
df$site <- gsub("2", "Philips", df$site)
df$site <- gsub("3", "Siemens", df$site)
df$site <- as.factor(df$site)

# Recode the sex variable (0 = female, 1 = male)
df$sex <- gsub("0", "Female", df$sex)
df$sex <- gsub("1", "Male", df$sex)
df$sex <- factor(df$sex)
Identify and quantify outliers in brain region deviation scores by applying a defined statistical threshold, creating a binary representation for outlier detection, and summing these to obtain a total outlier score for each subject. The process includes visualizing these scores by phenotype and performing a basic regression analysis to examine the relationship between phenotypes and outlier scores, with additional summarization of the outlier distributions across phenotypes.
# Establish the outlier threshold
outlier_threshold <- -1.96 # bottom 2.5%, as used in the Verdi et al., 2023 paper

# Apply the threshold to the z-score data to create dummy columns
df3 <- as.data.frame(ifelse(df[, 6:153] < outlier_threshold, 1, 0))

# Rename all binarized columns to have the suffix "_bin"
df3 <- df3 %>% rename_all(paste0, "_bin")

# Sum the dummy column values to create a total outlier score
df$total_outlier_score <- rowSums(df3)

# Merge the two data frames
df <- cbind(df, df3)

# Remove df3 from the R environment, as it's no longer needed
rm(df3)

# Create a group-difference box plot with colored outlines and filled points
ggplot(df, aes(x = phenotype, y = total_outlier_score, color = phenotype, fill = phenotype)) +
  geom_boxplot(outlier.shape = NA, fill = NA, size = 1) + # Draw box plots with no fill, only outlines
  geom_jitter(width = 0.2, size = 2, shape = 21) +        # Add jittered points with the same color
  labs(x = "Phenotype", y = "Total Outlier Score") +
  theme_minimal() +
  scale_color_manual(values = c("MCI" = "mediumturquoise", "AD" = "lightslateblue")) + # Outline color by phenotype
  scale_fill_manual(values = c("MCI" = "mediumturquoise", "AD" = "lightslateblue")) +  # Point fill by phenotype
  theme(legend.position = "right")
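As a quick sanity check on the cutoff, -1.96 is (to two decimal places) the 2.5th percentile of the standard normal distribution, so deviation scores below it fall in the bottom 2.5% of the normative range:

```r
# The 2.5th percentile of the standard normal distribution
round(qnorm(0.025), 2)
```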
# Calculate descriptives of total outlier score grouped by phenotype (describeBy() from the psych package)
describeBy(df$total_outlier_score, df$phenotype, IQR = T)
Descriptive statistics by group
group: MCI
   vars  n mean   sd median trimmed  mad min max range skew kurtosis   se IQR
X1    1 40  5.3 6.79      2       4 2.97   0  28    28 1.59     1.78 1.07   7
------------------------------------------------------------
group: AD
   vars  n  mean    sd median trimmed  mad min max range skew kurtosis   se  IQR
X1    1 59 11.83 15.46      6     8.9 7.41   0  66    66 1.85     2.86 2.01 12.5
# Basic regression analysis with phenotype as the predictor and total outlier score as the outcome
s1 <- lm(total_outlier_score ~ phenotype, data = df)
tab_model(s1, show.ci = 0.95, title = "Linear Model: Total Outlier Score as a Function of Phenotype") # tab_model() from sjPlot
Linear Model: Total Outlier Score as a Function of Phenotype

Outcome: total outlier score

| Predictors       | Estimates     | CI           | p     |
|------------------|---------------|--------------|-------|
| (Intercept)      | 5.30          | 1.31 – 9.29  | 0.010 |
| phenotype [AD]   | 6.53          | 1.37 – 11.70 | 0.014 |
| Observations     | 99            |              |       |
| R2 / R2 adjusted | 0.061 / 0.051 |              |       |
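With a single binary predictor, the intercept is the mean of the reference group (MCI) and the AD coefficient is the difference in group means, so the model estimates can be checked directly against the descriptives reported above:

```r
# Group means taken from the describeBy() output above
mci_mean <- 5.30   # MCI group mean = intercept
ad_mean  <- 11.83  # AD group mean
ad_mean - mci_mean # 6.53, matching the phenotype [AD] estimate
```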
ADNI vs PDC Preliminary Analyses
Evaluate correlations between brain region deviation scores derived from hierarchical versus linear Bayesian regression models by merging relevant datasets and calculating Pearson correlation coefficients for predefined brain region subsets. The code handles large data by segmenting regions into smaller subsets to prevent memory overload and applies the Benjamini-Hochberg procedure to correct for multiple comparisons. The results are visualized using scatter and lollipop plots to highlight significant correlations across the brain regions, categorizing significance levels based on adjusted p-values.
Note
The Benjamini-Hochberg (BH) correction is employed in this analysis to adjust for multiple comparisons while balancing the identification of true associations against the risk of false discoveries. The method controls the False Discovery Rate, which is particularly useful in large datasets with many hypotheses. Unlike more conservative methods such as Bonferroni, which bound the probability of even a single false positive across all tests, the BH correction controls the expected proportion of false positives among the rejected hypotheses. This preserves power to detect genuine effects while keeping false discoveries in check, making it well suited to complex data where tests are not independent, such as correlation studies involving many regions or variables.
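The difference between the two corrections can be illustrated with p.adjust() on a toy set of p-values (these values are invented purely for illustration):

```r
# Five hypothetical p-values from five tests
p <- c(0.001, 0.01, 0.02, 0.03, 0.5)

# Bonferroni multiplies every p-value by the number of tests (capped at 1)
p.adjust(p, method = "bonferroni") # 0.005 0.050 0.100 0.150 1.000

# BH scales by rank, so moderately small p-values are penalized far less
p.adjust(p, method = "BH")

# At the 0.05 threshold, BH retains four discoveries while Bonferroni retains one
sum(p.adjust(p, method = "BH") < 0.05)          # 4
sum(p.adjust(p, method = "bonferroni") < 0.05)  # 1
```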
# Initialize the master results list (brain_regions_list is assumed to have been
# created upstream by splitting the regions into smaller subsets)
all_correlation_results <- list()

# Process each subset of brain regions
for (brain_regions in brain_regions_list) {
  correlation_results <- list()
  for (region in brain_regions) {
    region_hbr <- paste(region, ".HBR", sep = "")
    region_blr <- paste(region, ".BLR", sep = "")
    # Ensure both variables are present in the dataset
    if (region_hbr %in% names(merged_data) && region_blr %in% names(merged_data)) {
      # Use cor.test to get both the correlation and its p-value
      test_result <- cor.test(merged_data[[region_hbr]], merged_data[[region_blr]],
                              method = "pearson", use = "complete.obs")
      # Store results in a list
      correlation_results[[region]] <- list(
        Correlation = test_result$estimate,
        P_Value = test_result$p.value
      )
    }
  }
  # Add subset results to the master list
  all_correlation_results <- c(all_correlation_results, correlation_results)
}

# Convert the list to a data frame for easy viewing
correlation_df <- data.frame(
  Region = names(all_correlation_results),
  Correlation = sapply(all_correlation_results, function(x) x$Correlation),
  P_Value = sapply(all_correlation_results, function(x) x$P_Value),
  row.names = NULL
)

# Apply the Benjamini-Hochberg correction to the p-values
correlation_df$Adjusted_P_Value <- p.adjust(correlation_df$P_Value, method = "BH")

# Print the results
print(correlation_df)
# Convert the data to a suitable format for ggplot
correlation_df$Region <- factor(correlation_df$Region, levels = correlation_df$Region)

# Create a scatter plot showing the correlation value for each region
# (assumes 'merged_data' contains corresponding columns for HBR and BLR scores)
p <- ggplot(correlation_df, aes(x = Region, y = Correlation)) +
  geom_point(stat = "identity") +
  ylim(0.8, 1) +
  labs(title = "Correlation of Deviation Scores Across Brain Regions",
       x = "Brain Region", y = "Correlation Coefficient") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 5.5))

# Print the plot
print(p)
# Add a column for significance level based on the adjusted p-values
correlation_df$Significance <- ifelse(correlation_df$Adjusted_P_Value < 0.001, '***',
                               ifelse(correlation_df$Adjusted_P_Value < 0.01, '**',
                               ifelse(correlation_df$Adjusted_P_Value < 0.05, '*', 'ns')))

# Create the lollipop plot
ggplot(correlation_df, aes(x = reorder(Region, Correlation), y = Correlation, color = Significance)) +
  geom_segment(aes(xend = Region, yend = 0.8), size = 0.25) + # Lines from the axis baseline to each correlation value
  geom_point(size = 1) +                                      # Points for each correlation value
  scale_color_manual(values = c('***' = 'red', '**' = 'orange', '*' = 'blue', 'ns' = 'grey')) + # Colors by significance
  labs(title = "Correlation and Significance of Brain Regions: Hierarchical vs. Linear Bayesian Regression",
       x = "Brain Region",
       y = "Correlation Coefficient",
       color = "Significance Level") +
  theme_minimal() +
  theme(legend.position = "top",                                                    # Move the legend to the top of the plot
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 5.5), # Improve readability of x-axis labels
        legend.title = element_text(face = "bold"))                                 # Bold the legend title for emphasis
Session Information
To enhance reproducibility, details about the working environment used for these analyses can be found below.
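The environment details are typically captured with a single call; the exact output depends on the machine and package versions used for the analyses:

```r
# Report the R version, platform, locale, and attached package versions
sessionInfo()
```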